Implementing Matrix Multiplications on the Multi-Core CPU Architectures
Abstract
Recent commercial microprocessors concentrate on multi-core CPU architectures, while most parallel and/or distributed computing methods target multi-CPU architectures. Traditional parallel algorithms therefore need to be analyzed and adapted for the new multi-core environments. In this paper, we take matrix multiplication as the target problem and implement it using several methods, including a traditional serial version and parallel versions based on OpenMP, Windows threads, and others. We measure the execution time of each implementation and analyze their overall performance. According to our experimental results, the most important factor in execution time is efficient use of the CPU's level-2 caches. We expect to develop a more efficient implementation method and to design a new matrix multiplication method for multi-core CPUs.
Key-Words: Multi-core CPU, parallel computing, performance analysis.
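As a rough illustration of the kind of comparison the abstract describes, the sketch below shows a serial matrix multiplication and an OpenMP-parallel counterpart in C. It is not the authors' code: the function names `matmul_serial` and `matmul_omp`, the square row-major layout, and the i-k-j loop order (a common way to make the inner loop walk memory contiguously, which tends to help cache use) are all assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical serial baseline: C = A * B for n x n row-major matrices.
 * The i-k-j loop order makes the innermost loop read one row of B and
 * update one row of C contiguously. */
void matmul_serial(int n, const double *a, const double *b, double *c)
{
    for (int i = 0; i < n; ++i) {
        double *c_row = c + (size_t)i * n;
        for (int j = 0; j < n; ++j)
            c_row[j] = 0.0;
        for (int k = 0; k < n; ++k) {
            const double a_ik = a[(size_t)i * n + k];
            const double *b_row = b + (size_t)k * n;
            for (int j = 0; j < n; ++j)
                c_row[j] += a_ik * b_row[j];
        }
    }
}

/* Hypothetical OpenMP variant: rows of C are split across threads.
 * Each thread writes only its own rows, so no locking is needed.
 * Built without OpenMP support, the pragma is ignored and the loop
 * simply runs serially. */
void matmul_omp(int n, const double *a, const double *b, double *c)
{
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        double *c_row = c + (size_t)i * n;
        for (int j = 0; j < n; ++j)
            c_row[j] = 0.0;
        for (int k = 0; k < n; ++k) {
            const double a_ik = a[(size_t)i * n + k];
            const double *b_row = b + (size_t)k * n;
            for (int j = 0; j < n; ++j)
                c_row[j] += a_ik * b_row[j];
        }
    }
}
```

With GCC or Clang, OpenMP is typically enabled with the `-fopenmp` flag. Timing both functions on the same inputs for a range of matrix sizes would give one simple version of the serial-versus-parallel measurement mentioned above.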
Similar resources
"Wide or tall" and "sparse matrix dense matrix" multiplications
This note explores sparse matrix dense matrix (SMDM) multiplications, useful in block Krylov or block Lanczos methods. SMDM computations are AU and VA: multiplication of a large sparse m × n matrix A by a matrix V of k rows of length m or a matrix U of k columns of length n, with k << m and k << n. In a block Lanczos or Krylov algorithm, matrix matrix multiplications with the "tall" U and "w...
Finite element assembly strategies on multi- and many-core architectures
We demonstrate that radically differing implementations of finite element methods are needed on multicore (CPU) and many-core (GPU) architectures, if their respective performance potential is to be realised. Our experimental investigations using a finite element advection-diffusion solver show that increased performance on each architecture can only be achieved by committing to specific and div...
Many-body quantum chemistry on graphics processing units
Heterogeneous nodes composed of a multicore CPU and at least one graphics processing unit (GPU) are increasingly common in high-performance scientific computing, and significant programming effort is currently being undertaken to port existing scientific algorithms to these unique architectures. We present implementations for two many-body quantum chemistry methods on heterogeneous nodes: the c...
Sparse-matrix vector multiplication on hybrid CPU+GPU platform
Sparse-matrix vector multiplication (SpMV) is a basic operation in many linear algebra kernels, so it is interesting to have an SpMV on modern architectures such as the GPU. Because it is an irregular computation, the CPU also performs comparably to the GPU, so it is also interesting to have this routine on hybrid architectures such as CPU+GPU. We have therefore designed a hybrid algorithm for SpMV which uses a CPU and a GPU. We have ex...
A Data-Parallel Algorithmic Modelica Extension for Efficient Execution on Multi-Core Platforms
New multi-core CPU and GPU architectures promise high computational power at a low cost if suitable computational algorithms can be developed. However, parallel programming for such architectures is usually non-portable, low-level and error-prone. To make the computational power of new multi-core architectures more easily available to Modelica modelers, we have developed the ParModelica algorit...